A grid-based approach for enterprise-scale data mining
Abstract
We describe a grid-based approach for enterprise-scale data mining that leverages database technology for I/O parallelism, and on-demand compute servers for compute parallelism in the statistical computations. By enterprise-scale, we mean the highly automated use of data mining in vertical business applications, where the data is stored on one or more relational database systems, and where a distributed architecture comprising high-performance compute servers or a network of low-cost, commodity processors is used to improve application performance and provide the application deployment flexibility for overall workload management. The approach relies on an algorithmic decomposition of the data mining kernel on the data and compute grids, which makes it possible to exploit the parallelism on the respective grids in a simple way, while minimizing the data transfer between them. The overall approach is compatible with existing database standards for data mining task specification and results reporting, and hence external applications using these standards-based interfaces do not have to be modified in order to realize the benefits of this grid-based approach.

Data-mining technologies that automate the generation and application of statistical models from data are of interest in a variety of applications cutting across industry sectors. These applications include, for example, customer relationship management (Retail, Banking and Finance, Telecom), fraud detection (Banking and Finance, Telecom), lead generation for marketing and sales (Insurance, Retail), clinical data analysis (Health Care), risk modeling and management (Banking and Finance, Insurance), process modeling and quality control (Manufacturing), genomic data and micro-array analysis (Life Sciences), yield management and logistics (Travel and Transportation), and text classification and categorization (cross-industry), among others.
Further details, including specific case studies for some of these applications, can be found in [1]. In general, the underlying statistical analysis (for predictive modeling, forecasting, optimization, or exploratory multivariate data analysis) in these business applications is very computationally intensive. Our grid-based approach is motivated by some of the requirements and challenges for developing enterprise-scale data mining solutions for these applications. By enterprise-scale, we mean the use of data mining as a tightly integrated component in the workflow of vertical business applications, with the relevant data being stored on highly available, secure, commercial relational database systems. These two aspects of the present problem differentiate the present work from other data-intensive problems studied in the data grid and scientific computing literature (e.g., [2], [3]). The outline of the remainder of the paper is as follows. Section I considers the current state …
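The decomposition mentioned in the abstract, in which the database side does the data-intensive work and the compute side does the statistics with little data moving between them, can be illustrated with a small sketch. The excerpt does not give the paper's actual algorithm, so everything here is an assumption chosen for illustration: the `records` table, the single feature `x`, the in-memory SQLite databases standing in for data-grid partitions, and the per-class Gaussian fit standing in for the mining kernel.

```python
import sqlite3

# Hypothetical aggregation query: each database partition reduces its rows
# to per-class sufficient statistics (count, sum, sum of squares).
STATS_SQL = """
    SELECT label, COUNT(*), SUM(x), SUM(x * x)
    FROM records
    GROUP BY label
"""

def make_partition(rows):
    """Create an in-memory SQLite database standing in for one data-grid node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (label TEXT, x REAL)")
    conn.executemany("INSERT INTO records VALUES (?, ?)", rows)
    return conn

def partition_stats(conn):
    """Run the aggregation inside the database; only small summaries leave it."""
    return {label: (n, s, ss) for label, n, s, ss in conn.execute(STATS_SQL)}

def merge_stats(all_stats):
    """Compute-side step: add up the summaries from every partition."""
    merged = {}
    for stats in all_stats:
        for label, (n, s, ss) in stats.items():
            n0, s0, ss0 = merged.get(label, (0, 0.0, 0.0))
            merged[label] = (n0 + n, s0 + s, ss0 + ss)
    return merged

def fit_gaussians(merged):
    """Turn merged summaries into per-class mean and population variance."""
    model = {}
    for label, (n, s, ss) in merged.items():
        mean = s / n
        model[label] = (mean, ss / n - mean * mean)
    return model

# Two toy partitions (illustrative data only).
partitions = [
    make_partition([("a", 1.0), ("a", 3.0), ("b", 10.0)]),
    make_partition([("a", 2.0), ("b", 14.0)]),
]
model = fit_gaussians(merge_stats(partition_stats(p) for p in partitions))
```

In this sketch the amount of data crossing from the database side to the compute side depends on the number of classes rather than the number of rows, which is the general property the abstract attributes to the decomposition.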
Similar resources
Multi-agent Web Text Mining on the Grid for Enterprise Decision Support
In this study, a multi-agent web text mining system on the grid is developed to support enterprise decision-making. First, an individual intelligent learning agent that learns about underlying text documents is presented to discover the useful knowledge for enterprise decision. In order to scale the individual intelligent agent with the large number of text documents on the web, we then provide...
Association Rules Extraction from Big Data Using BOINC-based Enterprise Desktop Grid
The paper describes an approach to association rules extraction from huge data sets using BOINC-based Enterprise Desktop Grid. An algorithm of data analysis and a native BOINC-based application are presented. Several experiments with the aim of validation and performance evaluation of the algorithm implementation were performed. The results of the experiments show that the approach is promising...
Internet Intelligent Platform-AGrIP
In the information age, both governments and enterprises need an intelligent platform to provide dynamic, real-time intelligent information processing services. This paper describes an Agent Grid based Intelligent Platform AGrIP to support service layer information processing. AGrIP integrates a comprehensive set of tools to compose a whole platform that will enable straightforward management of large-...
Credit scoring in banks and financial institutions via data mining techniques: A literature review
This paper presents a comprehensive review of the works done during 2000–2012 in the application of data mining techniques in credit scoring. Yet there isn’t any literature in the field of data mining applications in credit scoring. Using a novel research approach, this paper investigates academic and systematic literature review and includes all of the journals in the Science direct onli...
Commodity-Grid Based Distributed Pattern Recognition Framework
Large-scale pattern recognition for data mining requires significant processing resources. Distributed pattern recognition provides an avenue for achieving large-scale pattern recognition by using a state-of-the-art data classifier for fast tracking large-scale data analyses. In this paper, we will introduce a framework for distributed pattern recognition which is grid enabled and employs a dist...
Journal: Future Generation Comp. Syst.
Volume: 23
Publication date: 2007